A Similarity Measurement with Entropy-Based Weighting for Clustering Mixed Numerical and Categorical Datasets

نویسندگان

چکیده

Many mixed datasets with both numerical and categorical attributes have been collected in various fields, including medicine, biology, etc. Designing appropriate similarity measurements plays an important role clustering these datasets. traditional treat equally when measuring the similarity. However, different may contribute differently as amount of information they contained could vary a lot. In this paper, we propose measurement entropy-based weighting for The data are first transformed into by automatic categorization technique. Then, strategy is applied to denote importances attributes. We incorporate proposed iterative algorithm, extensive experiments show that algorithm outperforms OCIL K-Prototype methods 2.13% 4.28% improvements, respectively, terms accuracy on six from UCI.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering the Mixed Numerical and Categorical Datasets using Similarity Weight and Filter Method

Clustering is a challenging task in data mining technique. The aim of clustering is to group the similar data into number of clusters. Various clustering algorithms have been developed to group data into clusters. However, these clustering algorithms work effectively either on pure numeric data or on pure categorical data, most of them perform poorly on mixed categorical and numerical data type...

متن کامل

Clustering the Mixed Numerical and Categorical Dataset using Similarity Weight and Filter Method

Clustering is a challenging task in data mining technique. The aim of clustering is to group the similar data into number of clusters. Various clustering algorithms have been developed to group data into clusters. However, these clustering algorithms work effectively either on pure numeric data or on pure categorical data, most of them perform poorly on mixed categorical and numerical data type...

متن کامل

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Bi-level clustering of mixed categorical and numerical biomedical data

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for 'Bi-Level Clustering of Mixed categorical and numerical data types'. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedi...

متن کامل

Holo-Entropy Based Categorical Data Hierarchical Clustering

Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerica...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Algorithms

سال: 2021

ISSN: ['1999-4893']

DOI: https://doi.org/10.3390/a14060184